In concurrency control of databases, transaction processing (transaction management), and other transactional distributed applications, Global serializability (or Modular serializability) is a property of a global schedule of transactions. A global schedule is the unified schedule of all the individual database (and other transactional object) schedules in a multidatabase environment (e.g., a federated database). Complying with global serializability means that the global schedule is serializable (has the serializability property), while each component database (module) has a serializable schedule as well. In other words, a collection of serializable components must provide overall system serializability, which does not follow automatically from the serializability of the components themselves. The need for correctness across databases in multidatabase systems makes global serializability a major goal for global concurrency control (or modular concurrency control). With the proliferation of the Internet, Cloud computing, Grid computing, and small, portable, powerful computing devices (e.g., smartphones), as well as the increasing sophistication of systems management, the need for atomic distributed transactions, and thus for effective global serializability techniques to ensure correctness in and among distributed transactional applications, appears to be growing.
In a federated database system or any other more loosely defined multidatabase system, which are typically distributed in a communication network, transactions span multiple (and possibly distributed) databases. Enforcing global serializability in such a system, where different databases may use different types of concurrency control, is problematic. Even if every local schedule of a single database is serializable, the global schedule of the whole system is not necessarily serializable. The massive exchange of conflict information between databases needed to reach conflict serializability globally would lead to unacceptable performance, primarily due to computer and communication latency. Achieving global serializability effectively over different types of concurrency control remained an open problem for several years. Commitment ordering (or Commit ordering; CO), a serializability technique publicly introduced in 1991 by Yoav Raz from Digital Equipment Corporation (DEC), provides an effective general solution for global (conflict) serializability across any collection of database systems and other transactional objects, with possibly different concurrency control mechanisms. CO does not need the distribution of conflict information, but rather uses the already needed (unmodified) atomic commitment protocol messages without any further communication between databases. It also allows optimistic (non-blocking) implementations. CO generalizes Strong strict two-phase locking (SS2PL), which in conjunction with the Two-phase commit (2PC) protocol is the de facto standard for achieving global serializability across (SS2PL-based) database systems. As a result, CO-compliant database systems (with any, possibly different, concurrency control types) can transparently join existing SS2PL-based solutions for global serializability.
The same also applies to all other multiple-object (transactional) systems that use atomic transactions and need global serializability for correctness (see the examples above; today such a need is no smaller than in database systems, where atomic transactions originated).
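The claim that locally serializable schedules can still combine into a non-serializable global schedule can be checked with the standard conflict-graph (precedence-graph) test. The following is a minimal Python sketch, assuming a toy encoding of each schedule as `(transaction, action, item)` tuples with action `"r"` or `"w"`; the transactions `T1`, `T2`, the items `x`, `y`, and the function names are hypothetical and not taken from any particular system.

```python
from collections import defaultdict

def conflict_edges(schedule):
    """Precedence edges (ti, tj): an operation of ti precedes a conflicting one of tj."""
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            # Conflict: different transactions, same item, at least one write.
            if ti != tj and xi == xj and "w" in (ai, aj):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in a directed graph given as an edge set."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
    state = {}  # node -> "active" while on the DFS stack, "done" afterwards

    def visit(node):
        state[node] = "active"
        for nxt in graph[node]:
            if state.get(nxt) == "active":
                return True
            if nxt not in state and visit(nxt):
                return True
        state[node] = "done"
        return False

    return any(node not in state and visit(node) for node in list(graph))

# Two global transactions T1 and T2, each running at databases A and B.
db_a = [("T1", "w", "x"), ("T2", "r", "x")]  # at A: T1 -> T2
db_b = [("T2", "w", "y"), ("T1", "r", "y")]  # at B: T2 -> T1

print(has_cycle(conflict_edges(db_a)))  # False: A is locally serializable
print(has_cycle(conflict_edges(db_b)))  # False: B is locally serializable
print(has_cycle(conflict_edges(db_a) | conflict_edges(db_b)))  # True: global cycle
```

Each local conflict graph is acyclic, but their union contains the cycle T1 → T2 → T1, so no single serial order of T1 and T2 is consistent with both databases.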
The most significant aspects of CO that make it a uniquely effective general solution for global serializability are the following:
All these aspects, except the first two, are also possessed by the popular SS2PL, which is a (constrained, blocking) special case of CO and inherits many of CO's qualities.
The difficulties described above translate into the following problem:
Lack of an appropriate solution for the global serializability problem has driven researchers to look for alternatives to serializability as a correctness criterion in a multidatabase environment (e.g., see Relaxing global serializability below), and the problem has been characterized as difficult and open. The following two quotations demonstrate the mindset about it by the end of 1991, with similar quotations in numerous other articles:
Commitment ordering,[2][3] publicly introduced in May 1991 (see below), provides an efficient, elegant general solution, from both practical[4][5] and theoretical[6] points of view, to the global serializability problem across database systems with possibly different concurrency control mechanisms. It provides conflict serializability with no negative effect on availability, and with no worse performance than the de facto standard for global serializability, CO's special case Strong strict two-phase locking (SS2PL). It requires knowledge of neither local nor global transactions.
The Commitment ordering solution comprises effective integration of autonomous database management systems with possibly different concurrency control mechanisms. It achieves this while local and global transactions execute in parallel, without restricting any read or write operation in either local or global transactions, and without compromising the systems' autonomy.
Even in later years, after the public introduction of the Commitment ordering general solution in 1991, many still considered the problem unsolvable:
The quotation above is from a 1997 article proposing a relaxed global serializability solution (see Relaxing global serializability below) and referencing Commitment ordering (CO) articles. The CO solution effectively supports both full ACID properties and full local autonomy, and meets the other requirements posed above in the Problem statement section; it apparently has been misunderstood.
Similar thinking appears in the following quotation from a 1998 article:
The article quoted above also proposes a relaxed global serializability solution while referencing the CO work. The CO solution for global serializability both bridges different concurrency control protocols with no substantial concurrency reduction (typically minor, if any) and maintains the autonomy of local DBMSs. Evidently, CO has been misunderstood here as well. This misunderstanding continues into 2010 in a textbook by some of the same authors, where the same relaxed global serializability technique, Two level serializability, is emphasized and described in detail, and CO is not mentioned at all.[10]
On the other hand, the following quotation on CO appears in a 2009 book:[11]
The characteristics and properties of the CO solution are discussed below.
Several solutions, some partial, have been proposed for the global serializability problem. Among them:
The problem of global serializability was intensively researched in the late 1980s and early 1990s. Commitment ordering (CO) has provided an effective general solution to the problem, insight into it, and an understanding of possible generalizations of strong strict two-phase locking (SS2PL), which has been used almost exclusively in practice (in conjunction with the Two-phase commit protocol (2PC)) since the 1980s to achieve global serializability across databases. An important side benefit of CO is the automatic global deadlock resolution that it provides (this also applies to distributed SS2PL; although global deadlocks have been an important research subject for SS2PL, their automatic resolution has been overlooked, except in the CO articles, until today (2009)). At that time many types of commercial database systems existed, many of them non-relational, and databases were very small by today's standards. Multidatabase systems were considered a key to database scalability through database system interoperability, and global serializability was urgently needed. Since then, tremendous progress in computing power, storage, and communication networks has resulted in orders-of-magnitude increases in centralized database sizes, transaction rates, and remote access to databases, and has blurred the boundaries between centralized and distributed computing over fast, low-latency local networks (e.g., InfiniBand).
These developments, together with progress in database vendors' distributed solutions (primarily the popular approach based on SS2PL with 2PC, a de facto standard that allows interoperability among different vendors' SS2PL-based databases; both SS2PL and 2PC technologies have gained substantial expertise and efficiency), workflow management systems, and database replication technology, have in most cases provided satisfactory and sometimes better information technology solutions without multidatabase atomic distributed transactions over databases with different concurrency control (bypassing the problem above). As a result, the sense of urgency that existed with the problem at that period, and in general with high-performance distributed atomic transactions over databases with different concurrency control types, has diminished. However, the need for concurrent distributed atomic transactions as a fundamental element of reliability exists in distributed systems also beyond database systems, and so does the need for global serializability as a fundamental correctness criterion for such transactional systems (see also Distributed serializability in Serializability). With the proliferation of the Internet, Cloud computing, Grid computing, small, portable, powerful computing devices (e.g., smartphones), and sophisticated systems management, the need for effective global serializability techniques to ensure correctness in and among distributed transactional applications appears to be growing, and thus so does the need for Commitment ordering (including its special case SS2PL, which is popular for databases; SS2PL, though, does not meet the requirements of many other transactional objects).
Commitment ordering[2][3] (or Commit ordering; CO) is the only high-performance, fault-tolerant solution providing conflict serializability that has been proposed as a fully distributed (no central computing component or data structure is needed), general mechanism that can be combined seamlessly with any local (to a database) concurrency control mechanism (see technical summary). Since the CO property of a schedule is a necessary condition for global serializability of autonomous databases (in the context of concurrency control), it provides the only general solution for autonomous databases (i.e., if autonomous databases do not comply with CO, then global serializability may be violated). Seemingly by sheer luck, the CO solution possesses many attractive properties:
The only overhead incurred by the CO solution is locally detecting conflicts (which is already done by any known serializability mechanism, both pessimistic and optimistic) and locally ordering, in each database system, both the (local) commits of local transactions and the votes for atomic commitment of global transactions. Such overhead is low. The net effect of CO may be some delay of commit events (but never more delay than with SS2PL, and on average less). This makes CO instrumental for global concurrency control of multidatabase systems (e.g., federated database systems). The underlying Theory of Commitment ordering,[6] part of Serializability theory, is both sound and elegant (and even "mathematically beautiful", referring to the structure and dynamics of conflicts, graph cycles, and deadlocks), with interesting implications for transactional distributed applications.
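The local ordering that CO requires can be illustrated with a small scheduler in the spirit of the generic CO algorithm: when a transaction is committed (or, for a global transaction, voted YES on), every still-undecided transaction that precedes it in the local conflict graph is aborted, so the commit order never contradicts the conflict order. This is a minimal Python sketch under simplifying assumptions: the class and method names are hypothetical, conflicts are reported by the caller rather than detected, and recoverability (cascading aborts) is ignored.

```python
from collections import defaultdict

class COScheduler:
    """Toy local scheduler that keeps commits in conflict (precedence) order."""

    def __init__(self):
        self.precedes = defaultdict(set)  # t -> undecided transactions with an edge into t
        self.undecided = set()
        self.committed = []               # commit order
        self.aborted = set()

    def begin(self, t):
        self.undecided.add(t)

    def conflict(self, earlier, later):
        """Record a conflict edge earlier -> later (earlier's operation came first)."""
        if earlier in self.undecided and later in self.undecided:
            self.precedes[later].add(earlier)

    def commit(self, t):
        """Commit t (or vote YES on it) and abort its undecided predecessors."""
        if t not in self.undecided:
            raise ValueError(f"{t} is already decided")
        victims = self.precedes[t] & self.undecided
        for v in victims:
            # Committing v later would place it after t despite v -> t.
            self._decide(v)
            self.aborted.add(v)
        self._decide(t)
        self.committed.append(t)
        return victims

    def _decide(self, t):
        self.undecided.discard(t)
        self.precedes.pop(t, None)
        for preds in self.precedes.values():
            preds.discard(t)

s = COScheduler()
for t in ("T1", "T2", "T3"):
    s.begin(t)
s.conflict("T1", "T2")  # T1 -> T2
s.conflict("T2", "T3")  # T2 -> T3
s.commit("T1")          # no undecided predecessors: nothing aborted
print(s.commit("T3"))   # {'T2'}: T2 precedes T3 and is still undecided
print(s.committed)      # ['T1', 'T3']
```

Aborting the predecessors is the non-blocking choice; a blocking variant, closer to what SS2PL effectively does, would instead delay the commit of `T3` until `T2` is decided.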
All the qualities of CO in the list above, except the first three, are also possessed by SS2PL, which is a special case of CO, but blocking and constraining. This partially explains the popularity of SS2PL as a solution (practically the only solution, for many years) for achieving global serializability. However, property 9 above, automatic resolution of global deadlocks, had not been noticed for SS2PL in the database research literature as of 2009 (except in the CO publications). This is because the phenomenon of voting deadlocks in such environments, and their automatic resolution by the atomic commitment protocol, has been overlooked.
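The voting-deadlock mechanism can be illustrated with a toy simulation, assuming the blocking variant of CO in which a database delays its YES vote on a global transaction until all of that transaction's local conflict predecessors are decided (under SS2PL the corresponding wait is a lock wait). A global cycle then leaves every coordinator short of a vote, and the atomic commitment protocol's timeout aborts a transaction, which breaks the cycle. The function name, the input encoding, and the deterministic victim choice are assumptions for illustration only; all transactions are assumed to be global and listed in `sites`.

```python
def resolve_votes(local_edges, sites):
    """Toy 2PC rounds for global transactions under blocking commitment ordering.

    local_edges[db]: set of (earlier, later) conflict edges at database db.
    sites[t]: databases where global transaction t participates.
    Returns a dict mapping each transaction to "commit" or "abort".
    """
    decided = {}
    pending = set(sites)
    while pending:
        progress = False
        for t in sorted(pending):
            # A participant votes YES on t only after every local predecessor
            # of t at that database has been decided (committed or aborted).
            all_yes = all(
                all(earlier in decided
                    for earlier, later in local_edges.get(db, ()) if later == t)
                for db in sites[t]
            )
            if all_yes:
                decided[t] = "commit"
                pending.discard(t)
                progress = True
        if not progress:
            # Every pending coordinator is missing a vote: a voting deadlock
            # caused by a global cycle. Model the 2PC timeout by aborting one
            # pending transaction (deterministically the smallest name here;
            # a real system aborts whichever transaction times out first).
            victim = min(pending)
            decided[victim] = "abort"
            pending.discard(victim)
    return decided

sites = {"T1": ["A", "B"], "T2": ["A", "B"]}
cyclic = {"A": {("T1", "T2")}, "B": {("T2", "T1")}}
print(resolve_votes(cyclic, sites))   # T1 aborted by timeout, then T2 commits
acyclic = {"A": {("T1", "T2")}, "B": {("T1", "T2")}}
print(resolve_votes(acyclic, sites))  # both commit, T1 first
```

No conflict information crosses database boundaries in this model: each database consults only its own edges, and the global cycle is broken purely by the commit protocol's timeout.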
Most existing database systems, including all major commercial database systems, are based on strong strict two-phase locking (SS2PL) and are already CO compliant. Thus they can participate in a CO-based solution for global serializability in multidatabase environments without any modification (except for the popular multiversioning, where additional CO aspects should be considered). Achieving global serializability across SS2PL-based databases using atomic commitment (primarily two-phase commit, 2PC) has been employed for many years (i.e., using the same CO solution for a specific special case; however, no reference prior to CO is known that notices this special case's automatic global deadlock resolution by the atomic commitment protocol's augmented-conflict-graph global cycle elimination process). Virtually all existing distributed transaction processing environments and supporting products rely on SS2PL and provide 2PC. In fact, SS2PL together with 2PC has become a de facto standard. This solution is a homogeneous concurrency control one, suboptimal (when both Serializability and Strictness are needed; see Strict commitment ordering; SCO) but still quite effective in most cases, sometimes at the cost of more computing power than the optimum would require. (However, for better performance, relaxed serializability is used whenever applications allow.) It allows interoperation among different SS2PL-compliant database system types, i.e., it allows heterogeneity in aspects other than concurrency control. SS2PL is a very constraining schedule property, and "takes over" when combined with any other property. For example, when combined with any optimistic property, the result is no longer optimistic, but rather characteristically SS2PL. On the other hand, CO does not change data-access scheduling patterns at all, and any combined property's characteristics remain unchanged.
Since CO also uses atomic commitment (e.g., 2PC) to achieve global serializability, as SS2PL does, any CO-compliant database system or transactional object can transparently join existing SS2PL-based environments, use 2PC, and maintain global serializability without any environment change. This makes CO a straightforward, natural generalization of SS2PL for any conflict-serializability-based database system, for all practical purposes.
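Why SS2PL-based systems are already CO compliant can be seen by checking the CO property directly: for every conflict between two committed transactions, the transaction whose operation came first must also commit first. The sketch below is a hypothetical Python checker, assuming a schedule of `(transaction, action, item)` tuples where action `"c"` marks a commit; the schedules are made up for illustration.

```python
def co_violations(schedule):
    """Return conflict edges (ti, tj) among committed transactions whose
    commits appear in the opposite order (empty set = schedule is CO)."""
    commit_pos = {t: i for i, (t, a, _) in enumerate(schedule) if a == "c"}
    ops = [(t, a, x) for t, a, x in schedule if a in ("r", "w")]
    bad = set()
    for k, (ti, ai, xi) in enumerate(ops):
        for tj, aj, xj in ops[k + 1:]:
            # Conflict edge ti -> tj: different transactions, same item, one write.
            if ti != tj and xi == xj and "w" in (ai, aj):
                if ti in commit_pos and tj in commit_pos and commit_pos[tj] < commit_pos[ti]:
                    bad.add((ti, tj))
    return bad

# Under SS2PL, T2's write lock on x waits for T1's read lock until T1 ends.
ss2pl = [("T1", "r", "x"), ("T1", "c", None), ("T2", "w", "x"), ("T2", "c", None)]
# Conflict-serializable (equivalent to T1, T2) but not CO: T2 commits before T1.
non_co = [("T1", "r", "x"), ("T2", "w", "x"), ("T2", "c", None), ("T1", "c", None)]

print(co_violations(ss2pl))   # set()
print(co_violations(non_co))  # {('T1', 'T2')}
```

The second schedule cannot be produced by an SS2PL scheduler, because T2's write on `x` would block until T1 commits; this is the sense in which SS2PL is a (blocking) special case of CO.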
Commitment ordering has been quite widely known inside the transaction processing and database communities at Digital Equipment Corporation (DEC) since 1990. It was kept under company confidentiality due to patenting processes.[4][5] CO was disclosed outside of DEC through lectures and the distribution of technical reports to database researchers in May 1991, immediately after its first patent filing. It was misunderstood by many database researchers for years after its introduction, as is evident from the quotations above from 1997-1998 articles that reference Commitment ordering articles. On the other hand, CO has been used extensively as a solution for global serializability in work on Transactional processes,[12][13] and more recently in the related Re:GRIDiT,[14][15] an approach to transaction management in the converging Grid computing and Cloud computing. See more in The History of Commitment Ordering.
Some techniques have been developed for relaxed global serializability (i.e., they do not guarantee global serializability; see also Relaxing serializability). Among them (with several publications each):
While local (to a database system) relaxed serializability methods compromise serializability for performance gain (and are used only when the application can tolerate the possible resulting inaccuracies, or its integrity is unharmed), it is unclear whether the various proposed relaxed global serializability methods, which compromise global serializability, provide any performance gain over commitment ordering, which guarantees global serializability. Typically, the declared intention of such methods has not been a performance gain over effective global serializability methods (which apparently were unknown to their inventors), but rather alternative correctness criteria, owing to the lack of a known effective global serializability method. Oddly, some of them were introduced years after CO had been introduced, and some even cite CO without realizing that it provides an effective global serializability solution, and thus without providing any performance comparison with CO to justify them as alternatives to global serializability for some applications (e.g., Two-level serializability[9]). Two-level serializability is even presented as a major global concurrency control method in a 2010 edition of a database textbook[10] (authored by two of the original authors of Two-level serializability, one of whom, Avi Silberschatz, is also an author of the original Strong recoverability articles). This book neither mentions nor references CO, and, strangely, apparently does not consider CO a valid Global serializability solution.
Another common reason nowadays for relaxing Global serializability is the availability requirement of internet products and services. This requirement is typically answered by large-scale data replication. The straightforward solution for synchronizing replicas' updates of the same database object is to include all these updates in a single atomic distributed transaction. However, with many replicas such a transaction is very large, and may span several computers and networks, some of which are likely to be unavailable. Thus such a transaction is likely to end with an abort and miss its purpose.[17] Consequently, Optimistic replication (Lazy replication) is often used (e.g., in many products and services by Google, Amazon, Yahoo, and the like), while Global serializability is relaxed and compromised for Eventual consistency. In this case relaxation is done only for applications that are not expected to be harmed by it.
Classes of schedules defined by relaxed global serializability properties either contain the global serializability class or are incomparable with it. What differentiates techniques for relaxed global conflict serializability (RGCSR) properties from those for relaxed conflict serializability (RCSR) properties that are not RGCSR is typically the way global cycles (cycles that span two or more databases) in the global conflict graph are handled. No distinction between global and local cycles exists for RCSR properties that are not RGCSR. RCSR contains RGCSR. Typically, RGCSR techniques eliminate local cycles, i.e., provide local serializability (which can be achieved effectively by regular, known concurrency control methods); however, they do not eliminate all global cycles (which would achieve global serializability).